Handling Multiword Expressions in Phrase-Based Statistical Machine Translation

Authors

  • Santanu Pal
  • Tanmoy Chakraborty
  • Sivaji Bandyopadhyay
Abstract

Preprocessing of the parallel corpus plays an important role in improving the performance of a phrase-based statistical machine translation (PB-SMT) system. In this paper, we propose a framework in which predefined information about multiword expressions (MWEs) can boost the performance of PB-SMT. We preprocess the parallel corpus to identify noun-noun MWEs, reduplicated phrases, complex predicates and phrasal prepositions. Single-tokenization of noun-noun MWEs, phrasal prepositions (source side only) and reduplicated phrases (target side only) provides significant gains over our previous best PB-SMT model. Automatic alignment of complex predicates substantially improves both the overall MT performance and the word alignment quality. To establish named entity (NE) alignments, we transliterate source NEs into the target language and then compare them with the target NEs. Target language NEs are first converted into a canonical form before the comparison takes place. The proposed system achieves significant improvements (6.38 BLEU points absolute, 73% relative improvement) over the baseline system on an English-Bengali translation task.
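The single-tokenization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the MWEs have already been identified and are supplied as a list of token sequences; the function name `single_tokenize`, the underscore joiner, and the greedy longest-match strategy are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: the helper name, the "_" joiner and the greedy
# longest-match strategy are assumptions, not the authors' implementation.

def single_tokenize(tokens, mwes, joiner="_"):
    """Merge each listed MWE found in `tokens` into a single token."""
    mwe_set = {tuple(m) for m in mwes}
    max_len = max((len(m) for m in mwe_set), default=1)
    out = []
    i = 0
    while i < len(tokens):
        merged = False
        # Try the longest candidate span first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            span = tuple(tokens[i:i + n])
            if span in mwe_set:
                out.append(joiner.join(span))
                i += n
                merged = True
                break
        if not merged:
            out.append(tokens[i])
            i += 1
    return out


sentence = "the bus stop is in front of the railway station".split()
mwes = [["bus", "stop"], ["railway", "station"], ["in", "front", "of"]]
print(" ".join(single_tokenize(sentence, mwes)))
# prints: the bus_stop is in_front_of the railway_station
```

After such a pass, the word aligner and phrase extractor see each merged MWE as one unit, which is the effect the abstract attributes to single-tokenization.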


Similar resources

Improving Statistical Machine Translation Using Domain Bilingual Multiword Expressions

Multiword expressions (MWEs) have proven useful for many natural language processing tasks. However, how to use them to improve the performance of statistical machine translation (SMT) is not well studied. This paper presents a simple yet effective strategy to extract domain bilingual multiword expressions. In addition, we implement three methods to integrate bilingual MWEs into Moses, the state...


Integration of Reduplicated Multiword Expressions and Named Entities in a Phrase Based Statistical Machine Translation System

Language-specific multiword expressions (MWEs) play important roles in many natural language processing (NLP) tasks. The present work reports on integrating reduplicated multiword expressions (RMWEs) into Phrase Based Statistical Machine Translation (PBSMT) to improve translation quality between Manipuri, a highly agglutinative Tibeto-Burman language, and English. In addition, Multiw...


Identifying bilingual Multi-Word Expressions for Statistical Machine Translation

Multiword expressions (MWEs) represent a key issue for numerous applications in Natural Language Processing (NLP), especially for Machine Translation (MT). In this paper, we describe a strategy for detecting translation pairs of MWEs in a French-English parallel corpus. In addition, we introduce three methods aiming to integrate extracted bilingual MWEs in MOSES, a phrase based Statistical Machine...


Multiword Expressions in Machine Translation

This work describes an experimental evaluation of the significance of phrasal verb treatment for obtaining better quality statistical machine translation (SMT) results. The importance of the detection and special treatment of phrasal verbs is measured in the context of SMT, where the word-for-word translation of these units often produces incoherent results. Two ways of integrating phrasal verb...


Promoting Flexible Translations in Statistical Machine Translation

While SMT systems can learn to translate multiword expressions (MWEs) from parallel text, they typically have no notion of non-compositionality, and thus overgeneralise translations that are only used in certain contexts. This paper describes a novel approach to measure the flexibility of a phrase pair, i.e. its tendency to occur in many contexts, in contrast to phrase pairs that are only valid...



Publication date: 2011